Add bootstrapping #54
base: develop

Conversation
Codecov Report

All modified and coverable lines are covered by tests ✅

```
@@           Coverage Diff            @@
##           develop     #54   +/-   ##
========================================
  Coverage    99.53%  99.53%
========================================
  Files           14      14
  Lines          866     866
  Branches        15      15
========================================
  Hits           862     862
  Misses           2       2
  Partials         2       2
```

---
Interesting! Would it work for the following use case?
---
@oiffrig this looks very cool! On this question:
I think the answer is yes. We have functions where the bootstrapping axis isn't necessarily the same, e.g. …
We also have some functions that have more than 2 parameters. I give the example of …
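Purely as a hypothetical illustration of both points (these signatures are invented, not taken from the codebase), such functions might look like:

```python
# A score whose own reduction axis differs from the axis we bootstrap over:
# roca reduces "number", while resampling would happen along "date"
def roca_mean(ct, roca_dim="number", mean_dim="date"):
    return roca(ct, dim=roca_dim).mean(dim=mean_dim)

# A function with more than two parameters
def threshold_hit_rate(fc, obs, threshold, dim="date"):
    return ((fc > threshold) == (obs > threshold)).mean(dim=dim)
```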
I also had a couple of thoughts/questions I thought I would throw out: …
---
Thanks @oiffrig, really interesting! I didn't know about bootstrapping, so it's always good to learn :) I'm a bit puzzled about the approach though. From what I see, the bootstrappable function that you use as an example here does not reduce the dataset in the axis direction, i.e. … Couldn't you then bootstrap the result directly instead of bootstrapping the function? That would make the whole process simpler, and would then work for @Oisin-M's and @martin-janousek-ecmwf's cases. In other words, I feel the bootstrappable function should be the statistical function that reduces the array (here, the array is the output of the metric function) in one or multiple dimensions. But those bootstrappable functions should not be the metrics that compare the experiment with the observation (CRPS, difference, bias, etc.). I always visualise these evaluation workflows in three steps: …

In this case, the bootstrapping is happening at the third level, and should be a type of spatial/temporal statistic. If I understand correctly, of course. Please correct me if I'm wrong!
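As a minimal sketch of that three-step picture — the loader and metric names are hypothetical, and only the decorator API from this PR is reused:

```python
# 1. Get the experiment and the reference data (hypothetical loaders)
fc = open_forecast()        # dims: [lat, lon, date]
obs = open_observations()   # dims: [lat, lon, date]

# 2. Compute the metric comparing experiment and observation;
#    this step does not reduce the date axis
score = bias(fc, obs)       # dims: [lat, lon, date]

# 3. Reduce with a spatial/temporal statistic; bootstrapping happens
#    here, resampling the date axis of the metric output
@enable_bootstrap(dim="date")
def temporal_mean(x, dim="date"):
    return x.mean(dim=dim)

bresult = temporal_mean.bootstrap(score)
low, high = bresult.quantiles(0.05)
```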
---
I must admit I was initially confused, as my experience with bootstrapping was exactly along the lines Corentin very clearly described. But then I thought, oh, maybe the bootstrapping was meant as something even more general, like the …
---
Ah yes, I got confused with the first example. I agree, actually, that the `difference` function should not be bootstrappable, and neither should the … The only edge case I'm not sure about is if a function isn't deterministic, but I don't think we have such a case anyway.
---
Yes, I tried to come up with a quick example, and got confused. Do disregard the …

Not as is, but it looks like something we should be able to support. We would need support for xarray (which looks more and more like the reasonable choice to make anyway), and a looser constraint on the number of inputs. Then, the updated bootstrapping would be applied on a function that takes …

```python
@enable_bootstrap(dim="date")
def roca_mean(ct, roca_dim="number", mean_dim="date"):
    return roca(ct, dim=roca_dim).mean(dim=mean_dim)

ct = contingency_table()
bresult = roca_mean.bootstrap(ct)
low, high = bresult.quantiles(0.05)
```

Does that look reasonable to you?
---
Correct me if I'm wrong, but I think we don't want to bootstrap the `roca_mean` function, but rather just the `mean` function. It's not necessary/desirable to recalculate ROCA for each bootstrapped time sample, since the ROCA doesn't touch the time axis. From what I understood, the calculation process would be:

1. Get CT

```python
ct = read_ct(data)
# dim:size [lon: Nx, lat: Ny, number: M, date: T]
```

2. Calculate ROCA

```python
roca = calculate_roca(ct, dim='number')
# dim:size [lon: Nx, lat: Ny, date: T]
```

Note: this would be bootstrappable along dimension `number`, but we only want to bootstrap along the time dimension (`date`). Therefore, we should not be recalculating ROCA for each sampled time: it should be done once.

3. Bootstrap ROCA output

```python
bootstrapped_roca = bootstrap_samples(roca, dim='date', sample_size=S, n_bootstraps=B)
# dim:size [lon: Nx, lat: Ny, date: S, sample: B]
```

4. Calculate mean per sample

```python
sample_means = calculate_mean(bootstrapped_roca, dim='date')
# dim:size [lon: Nx, lat: Ny, sample: B]
```

5. Calculate 5th and 95th quantiles over the samples

```python
quantiles = calculate_quantiles(sample_means, q=[0.05, 0.95], dim='sample')
# dim:size [lon: Nx, lat: Ny, quantile: 2]
```

Sidenote: we could also bootstrap this last step along the dimension `sample`, because it aggregates along that dimension. In practice, this would be pretty weird: we would essentially be trying to estimate the uncertainty in our estimated uncertainty of the mean.

Assuming I haven't misunderstood things, I think the snippet should therefore rather be

```python
@enable_bootstrap(dim="date")
def mean(roca_vals, mean_dim="date"):
    return roca_vals.mean(dim=mean_dim)

ct = contingency_table()
roca_vals = roca(ct, dim="number")
bresult = mean.bootstrap(roca_vals)
low, high = bresult.quantiles(0.05)
```

if we want to stick with returning this …
---
But ROCA is computed from a mean (or, equally, a sum) of CTs over time. We want to collect the CT at each station over a period of time and only then compute ROCA (and other CT-based scores like PSS, ETS, etc.).
---
Could it then work like …
---
Okay I see, thanks for the clarification @martin-janousek-ecmwf. I think then the snippet @oiffrig sent is the correct approach?
---
Sorry, but I still can't see this as correct. This hints that we first compute ROCA from each station's CT and only then the mean. The order is the opposite: first we sum up CTs, and only then do we apply … Wouldn't it rather be …
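A minimal sketch of that order, reusing the decorator API from above (the helper names are hypothetical); note that here the score genuinely has to be recomputed for every bootstrap sample, which is exactly why the order matters:

```python
@enable_bootstrap(dim="date")
def roca_from_accumulated_ct(ct, sum_dim="date", roca_dim="number"):
    ct_period = ct.sum(dim=sum_dim)       # first sum up the CTs over the period...
    return roca(ct_period, dim=roca_dim)  # ...and only then compute the score

ct = contingency_table()
bresult = roca_from_accumulated_ct.bootstrap(ct)
low, high = bresult.quantiles(0.05)
```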
---
Thanks for the input @martin-janousek-ecmwf. I agree it's not commutative and the calculation order leads to different interpretations, but I do have some confusion around what we're trying to capture. Happy to defer to your expertise, but doesn't averaging before computing ROCA lose temporal resolution? I would have thought we would want to compute the ROCA per timestep, so that we know the full distribution of ROCA values, and then afterwards find the mean of that distribution. The other way around, i.e. computing the ROCA on the mean, would result in us only considering the "mean …

But couldn't we just get the distribution of the …

```python
ct = get_ct()
roca_per_timestep = roca(ct)  # distribution of roca values

# just directly find metrics about the distribution
roca_mean = mean(roca_per_timestep, dim='time')
low, high = quantiles(roca_per_timestep, dim='time', qs=[0.05, 0.95])
```
---
Well, I suspect my "expertise" can be easily challenged, as I am not a statistician and here I am just hiding behind the fact that "we have always been doing it this way". And ROCA was probably poorly chosen as an example, as both the ROCA of a single step of a single forecast and the monthly mean ROCA of a single step make sense. A different score, like the ETS computed from a "deterministic" 2x2 contingency table, would have been a better choice to make my point, as ETS is a case where one first has to accumulate CTs over some time period and only then can compute the derived statistic (ETS).

That nonetheless does not change the fact that more often than not we do not use an arithmetic average of "daily" scores to get their average value over a period of time (for the same lead time). We use the arithmetic mean for bias, but monthly RMSE is not an average of daily RMSEs but sqrt(mean(RMSE**2)), and for ACC we don't average ACCs but their Fisher-transformed values. And monthly means of ROCA, ETS, PSS etc. are computed from month-accumulated CTs. And the biggest pain, the skill scores.

Honestly, these are kind of corner cases, always implemented only later on, but it may be useful to be aware of them as they can break the system.
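As a sketch of those non-arithmetic aggregations (plain NumPy, following the formulas described above; this is an illustration, not code from the PR):

```python
import numpy as np

def monthly_rmse(daily_rmse):
    # monthly RMSE is not the mean of daily RMSEs...
    return np.sqrt(np.mean(daily_rmse ** 2))

def monthly_acc(daily_acc):
    # ...and monthly ACC averages the Fisher-transformed (arctanh)
    # correlations, then transforms back
    return np.tanh(np.mean(np.arctanh(daily_acc)))
```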
---

Thanks both for your input. I'll try to refine the implementation based on this feedback. Regarding skill scores, I would be keen on having a look at these use cases anyway, just to see how different they are. It would be great if we could support them out of the box!

---
Dear all, I have now added support for xarray. I have also exposed a few functions, notably … Now that we have the … Also, at the moment …

---
This PR is meant to open a discussion around supporting generic bootstrapping. Please consider it a proof of concept to get the discussion going.
Goal
The goal is to add a simple interface for bootstrapping score computations.
Suggested implementation
This PR creates a decorator called `enable_bootstrap`. When used, it turns a function into a `Bootstrappable` object whose `__call__` method wraps the function. It adds a `bootstrap` method that instead performs bootstrapping.

Example

EDIT: replaced bad `difference` example with `mse`
EDIT2: added more examples
Decorator
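A minimal sketch of the decorator usage, assuming `enable_bootstrap` is importable from this package and using the `mse` example mentioned in the edits above (the data here is random placeholder data):

```python
import numpy as np
import xarray as xr

@enable_bootstrap(dim="date")  # turns mse into a Bootstrappable object
def mse(fc, obs, dim="date"):
    return ((fc - obs) ** 2).mean(dim=dim)

fc = xr.DataArray(np.random.rand(20, 50), dims=["lat", "date"])
obs = xr.DataArray(np.random.rand(20, 50), dims=["lat", "date"])

result = mse(fc, obs)             # __call__ still runs the plain function
bresult = mse.bootstrap(fc, obs)  # resamples along "date" instead
low, high = bresult.quantiles(0.05)
```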
Direct call
Direct call with xarray
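A hypothetical sketch of the direct call on xarray data; the exact signature of the exposed `bootstrap` function is assumed here, not confirmed by this thread:

```python
import numpy as np
import xarray as xr

def mse(fc, obs, dim="date"):
    return ((fc - obs) ** 2).mean(dim=dim)

fc = xr.DataArray(np.random.rand(20, 50), dims=["lat", "date"])
obs = xr.DataArray(np.random.rand(20, 50), dims=["lat", "date"])

# direct call: bootstrap a plain function without decorating it
bresult = bootstrap(mse, fc, obs, dim="date")
low, high = bresult.quantiles(0.05)
```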
Questions
- `BootstrapResult` contains the result for every bootstrapping sample. Should we support returning quantiles from `bootstrap` by default, and only return the full set of samples if explicitly requested?
- … `BootstrapResult`?
- Now that a standalone `bootstrap` function is available, do we want to keep the decorator?
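For the first question, one possible shape of such a default, purely as a hypothetical sketch (neither keyword exists in the PR):

```python
# Hypothetical: quantiles returned by default...
low, high = roca_mean.bootstrap(ct, quantiles=(0.05, 0.95))

# ...and the full set of samples only when explicitly requested
bresult = roca_mean.bootstrap(ct, return_samples=True)
samples = bresult.samples
```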